
Update parameters to build the suffix-array so we can also use the mmap option #2

Open
tibvdm wants to merge 2 commits into main from feature/speedup-loading-index


@tibvdm tibvdm commented Mar 30, 2026

Important

This PR can only be merged once the related PRs in the other repositories are merged:
unipept/unipept-index#34
unipept/unipept-api#89

Before merging, the -b feature/speedup-loading-index option has to be removed from the git clone command. Right now we clone a separate branch for testing purposes.

This pull request updates the update_uniprot.sh script to improve reliability and compatibility with new features. The changes include updating command-line arguments for suffix array generation, and making minor corrections in script execution.

Suffix array and index improvements:

  • The suffix array builder command is updated to use new output parameters: --output-sa, --output-proteins, and --output-mapping, generating multiple binary files instead of just one.
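As a rough sketch of what the updated invocation could look like: the three --output-* flags come from this PR and the sa.bin, proteins.bin and mapping.bin file names from the review overview, while the helper name and the index directory are assumptions made for illustration. Any other arguments the builder needs (such as its input) are deliberately left out, and the helper only prints the command instead of running it.

```shell
# Hypothetical helper: prints the builder command so the argument list
# can be inspected. Only the three --output-* flags are taken from this PR.
print_sa_builder_cmd() {
    index_dir="${1:?index directory required}"
    printf '%s\n' "sa-builder --output-sa ${index_dir}/sa.bin --output-proteins ${index_dir}/proteins.bin --output-mapping ${index_dir}/mapping.bin"
}

# Example directory, chosen only for illustration.
print_sa_builder_cmd "/tmp/index"
```

Keeping the suffix array, the proteins and the mapping in separate files is what allows each one to be loaded (or memory-mapped) on its own.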

Dependency management:

  • The check for the build-essential package has been commented out, possibly to avoid unnecessary installation steps or because it is no longer required.
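If the package-level check stays disabled, one lighter alternative (not part of this PR, only a sketch) would be to check for the individual tools the script calls. The helper name below is hypothetical, and which commands to pass in depends on what the script actually uses.

```shell
# Hypothetical helper: succeeds only if every named command is on PATH.
require_commands() {
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            printf 'missing dependency: %s\n' "$cmd" >&2
            return 1
        fi
    done
}

# Example call; "sh" is used here only because it is always available.
require_commands sh && echo "all dependencies found"
```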


Copilot AI left a comment


Pull request overview

Updates the UniProt update automation to align with upcoming suffix-array builder changes (multiple output binaries for mmap support) and makes a few small reliability/housekeeping adjustments.

Changes:

  • Update sa-builder invocation to emit sa.bin, proteins.bin, and mapping.bin.
  • Fix script path invocations by removing an unintended leading / before ${SCRATCH_DIR}.
  • Adjust dependency checks and ignore common local output directories in git.
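To illustrate the path fix, here is a small sketch assuming SCRATCH_DIR holds an absolute path; the directory and the example.sh file name are made up for the example.

```shell
# Assumed value; the real script sets SCRATCH_DIR elsewhere.
SCRATCH_DIR="/tmp/scratch"

# Before: the extra leading slash produces a path starting with "//".
old_path="/${SCRATCH_DIR}/example.sh"
# After: the variable is used as-is.
new_path="${SCRATCH_DIR}/example.sh"

printf 'old: %s\nnew: %s\n' "$old_path" "$new_path"
# old: //tmp/scratch/example.sh
# new: /tmp/scratch/example.sh
```

On Linux a leading double slash usually resolves to the same location, but POSIX leaves its meaning implementation-defined, and with a relative SCRATCH_DIR the extra slash would silently turn the path into an absolute one.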

Reviewed changes

Copilot reviewed 1 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| scripts/update-uniprot/update_uniprot.sh | Updates suffix-array build command/outputs and tweaks repo cloning + dependency checks. |
| .gitignore | Ignores root-level data and home paths (likely local/generated). |


Comment thread scripts/update-uniprot/update_uniprot.sh
Comment thread scripts/update-uniprot/update_uniprot.sh Outdated
@tibvdm tibvdm marked this pull request as ready for review April 1, 2026 11:41

@SimonVandeVyver SimonVandeVyver left a comment


The script works with the mmap feature, but don't forget to change the branches in the dependencies.


  # Download new version of the index repo
- git clone --quiet "https://github.com/unipept/unipept-index.git" "${SCRATCH_DIR:?}/unipept-index"
+ git clone -b feature/speedup-loading-index --quiet "https://github.com/unipept/unipept-index.git" "${SCRATCH_DIR:?}/unipept-index"

The script is still using the feature branch
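One possible way to make the branch easier to drop, sketched under assumptions: the branch is passed as an optional argument instead of being hard-coded. The helper name and the idea of an optional branch argument are hypothetical and not part of the script; the helper only prints the git command so it can be checked without touching the network.

```shell
# Hypothetical helper: prints the clone command for the index repository.
# With an empty or missing second argument, the default branch is cloned.
print_clone_cmd() {
    scratch_dir="${1:?scratch directory required}"
    branch="${2:-}"
    url="https://github.com/unipept/unipept-index.git"
    if [ -n "$branch" ]; then
        printf 'git clone -b %s --quiet %s %s\n' "$branch" "$url" "${scratch_dir}/unipept-index"
    else
        printf 'git clone --quiet %s %s\n' "$url" "${scratch_dir}/unipept-index"
    fi
}

# Testing against the feature branch:
print_clone_cmd "/tmp/scratch" "feature/speedup-loading-index"
# After the dependent PRs are merged:
print_clone_cmd "/tmp/scratch"
```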

